Lately I’ve been thinking a lot about data portability in terms of APIs, microformats and standards like SIOC, FOAF, OWL and APML. What fascinates me is that data portability enables people to share and move their data around the web seamlessly. I wrote about some microformats and standards for the semantic web a few days ago, but what about the APIs?

Nowadays, an API most often comes in the form of a REST service that returns JSON or XML, so that both JavaScript and server-side technologies can use it. That makes sense and it promotes data portability. Now think about a website. Some parts of it are tagged up with various microformats like XFN, hCard and hCalendar, but what about the rest of the content?

My idea is to render any webpage in either XML or JSON if a certain parameter is added to the URL. It could look like this:

www.example.com/about.aspx?format=xml

Then the about.aspx page would render the output as XML for machines to read and thereby have some sort of read-only API. The standard could be RSS or ATOM or both. Now any machine can read the content of your website easily. Think about how easy it must be for search engines to index an RSS representation of a webpage instead of going through all the excess mark-up and style information.
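To make the idea concrete, here is a minimal sketch in Python of a page that picks its output format from the format parameter. Everything in it is an assumption made for illustration: the PAGES dictionary, the render functions and the plain <page> XML element, which stands in for a real RSS or Atom document.

```python
import html
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Hypothetical page store; a real site would pull this from its CMS.
PAGES = {
    "/about.aspx": {
        "title": "About",
        "body": "Hello, this is the about page.",
    },
}


def render_html(page):
    """Normal output for browsers."""
    return "<html><head><title>{}</title></head><body><p>{}</p></body></html>".format(
        html.escape(page["title"]), html.escape(page["body"])
    )


def render_xml(page):
    """Machine-readable output; a plain <page> element stands in for RSS/Atom."""
    root = ET.Element("page")
    for key, value in page.items():
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


def render_json(page):
    """Machine-readable output for JavaScript and other JSON consumers."""
    return json.dumps(page)


RENDERERS = {"html": render_html, "xml": render_xml, "json": render_json}


def render(url):
    """Render the page behind `url` in the format named by ?format=..."""
    parsed = urlparse(url)
    page = PAGES[parsed.path]
    fmt = parse_qs(parsed.query).get("format", ["html"])[0]
    # Unknown formats fall back to the normal HTML page.
    return RENDERERS.get(fmt, render_html)(page)


if __name__ == "__main__":
    print(render("http://www.example.com/about.aspx?format=xml"))
    print(render("http://www.example.com/about.aspx?format=json"))
```

The same content is stored once and only the last step differs, which is why the read-only API comes almost for free.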

If any webpage could be served as RSS, ATOM and JSON then the content of that page could be consumed in a multitude of ways. The best thing is that it is relatively easy to implement. Because it is so easy I can’t stop wondering if there is a reason no one has done it before.

The question is who would use this and does it make any sense to begin with?

You have a Facebook profile with a lot of connected friends. You share your interests and personal information with those friends. You also have a LinkedIn profile with just as many connected contacts with whom you share your job-related information. All your pictures are in the hands of Flickr and your Twitter account tells your friends what you are up to on an hourly basis. Your mobile phone’s phonebook is handled by ZYB and all your personal videos are located at YouTube. All your favourite links are available from del.icio.us and your appointments are controlled by Google Calendar.

Does this sound familiar? If it does then here are some questions for you.

Think about these questions

  • Why maintain your profile information on multiple sites?
  • Who owns all this information?
  • Are you in charge of your online identity?

If you’re like me, you have a hard time answering those questions. Most people probably don’t think about this, but you are living the connected online life and should have an opinion on it. Not because of privacy issues, but because you know there must be easy answers to these questions. You just can’t find them, and that might bother you. So where do you go from here?

The semantic web

Ok, so you’ve heard about the semantic web but what does it mean in relation to all this? This is the real question and it will become the answer to all the previous questions.

At the moment we have billions of web pages floating around as islands in the cloud. All the information on those web pages is isolated and doesn’t give any meaning to computers. You have to be a human to understand the meaning of even the simplest web page. If it contains an address or information about an upcoming event, you have to be a human to understand that. The simple web page might also contain information about your interests and a review of a movie you just saw. If you have a blog then you probably also have a blogroll where you list the websites of your friends. Still, you have to be a human to understand that.

That’s where the semantic web kicks in. It knows about all this information and makes it searchable in a way that Google can only dream about. All you need to do is to adhere to some standards when you write information like addresses, calendar events, interests, friends etc. Standards like hCard, hCalendar, APML, SIOC, XFN and FOAF. These standards are machine readable and can be adopted by any website without changing the design and layout. I suggest you take a look at each of the standards if you haven’t heard of them before. Trust me, they’re damn easy to implement.
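As an illustration of what machine readable means here, the sketch below reads contact details out of a snippet marked up with hCard class names, using only Python's standard html.parser module. The HCardParser class, the sample markup and the small set of properties it looks for are assumptions made for this example; a real hCard consumer handles many more properties and edge cases.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names, chosen for illustration.
HCARD_PROPERTIES = {"fn", "org", "tel", "email"}


class HCardParser(HTMLParser):
    """Collects hCard properties found inside elements with class="vcard".

    Simplification: the markup must not contain unclosed void tags such
    as <br> or <img>, because open elements are tracked with a plain stack.
    """

    def __init__(self):
        super().__init__()
        self.cards = []
        # One entry per open element: "vcard", a property name, or None.
        self._stack = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        marker = None
        if "vcard" in classes:
            marker = "vcard"
            self.cards.append({})
        elif "vcard" in self._stack:
            names = classes & HCARD_PROPERTIES
            if names:
                marker = sorted(names)[0]
        self._stack.append(marker)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or "vcard" not in self._stack:
            return
        # Attribute the text to the innermost enclosing property, if any.
        for marker in reversed(self._stack):
            if marker == "vcard":
                return
            if marker is not None:
                card = self.cards[-1]
                card[marker] = (card.get(marker, "") + " " + text).strip()
                return


SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="tel">+1 555 0100</span>
</div>
"""

if __name__ == "__main__":
    parser = HCardParser()
    parser.feed(SAMPLE)
    print(parser.cards)
```

The markup still renders as an ordinary contact block for human visitors; only the class names tell a machine which piece is the name, the organisation, the e-mail address and the phone number.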

Now, imagine that you have a single web page that contains all this information in a machine readable way. This will become the central location of your online identity. Because it is written in a machine readable way, all the social networking sites you use can consume this information automagically.

The standards for the semantic web are ready for you to use. Three things are still missing for us to achieve our goal of a centralized online life and they are the key to success.

The final three steps

Step one: In order to have a centralized online life from where all information is spread and maintained across sites and social networks, the first thing we need is a web page suitable for this. This is where things get tricky. It has to be a page that you and only you own and control. A personal website is the natural choice, but then you need to know how to mark up your information according to the various standards, and that is not something the average Facebook user is capable of doing.

There could also be services that maintain this centralized profile for you. Facebook could be one such service, but then you still wouldn’t really know who owns your data – you or Facebook. No, as I see it, only a personal website puts you in total control. Hang on, we’ll get there.

Let’s take it to the second step and assume you have a personal website that holds your personal information and that only you control.

Step two: Social networking sites have to be capable of consuming the information from your personal website – your secure centralized information location. This will only happen if enough people have a centralized location and the demand thereby becomes great enough for the sites to implement support for it. It will not happen just because a few people implement the standards on their websites. We need something more that will drive a much bigger user base.

We need all the personal CMS and blog platform vendors to step up to the plate and provide this functionality out-of-the-box. Average Joe should not need to know about the inner workings of these standards; he just needs to know that when he enters his personal information it will work in a cross-site manner. This is what he should think after he has entered his information at his central location and visits Facebook to create an account:

I’ve already given the Internet my information once, why should I do it again? Here is my website URL, which should be enough for you to gather what you need in order to create my profile.

Step three: All this should be achieved by only having one set of credentials no matter where you sign up or sign in. OpenID takes care of that in a beautiful way. You sign in to your central location using your OpenID credentials and stay logged in whenever you visit LinkedIn, Twitter etc. The scenario now becomes very smooth and transparent.

The perfect scenario

Sign in to your personal website using OpenID and fill in your profile, interests and contact information, and list your friends and calendar events. Some of this you synchronize with Outlook, Messenger etc. Now you have a single identity that you control and own.

You have heard about Flickr so you visit the site and sign up with the same OpenID credentials. Then you start uploading your photos. Flickr automagically knows about your friends, so you can tag all of your pictures with the friends you already entered and they will be notified by e-mail even though they don’t have a Flickr account.

Now you move on to Facebook to sign up. Facebook automatically connects you to the friends you have entered on your personal website and imports the photos from Flickr as well. It also imports your interests, favourite movies etc. and adds them to your profile.

Then you find a girlfriend/boyfriend and update your status from single to in a relationship on your personal website. That status then automagically updates in your Facebook account without you needing to do anything. You also sign up to ZYB and from there you change your phone number. That information is then pushed back to your personal website and updated – probably with you having to approve it.
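The two directions of that flow can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the CentralProfile and PendingChange classes and their method names are made up for the example, and real sites would talk to each other over HTTP rather than through in-memory callbacks.

```python
from dataclasses import dataclass, field


@dataclass
class PendingChange:
    """A change proposed by an external site, waiting for the owner's approval."""
    source: str
    field_name: str
    new_value: str


@dataclass
class CentralProfile:
    """Hypothetical model of the personal website acting as the central location."""
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)
    # Callables invoked as notify(field_name, value), one per connected site.
    subscribers: list = field(default_factory=list)

    def update(self, field_name, value):
        """The owner edits the profile: apply at once and push to every site."""
        self.data[field_name] = value
        for notify in self.subscribers:
            notify(field_name, value)

    def propose(self, source, field_name, value):
        """An external site pushes a change back: queue it, do not apply it."""
        self.pending.append(PendingChange(source, field_name, value))

    def approve_all(self):
        """The owner approves the queued changes, which then spread as usual."""
        changes, self.pending = self.pending, []
        for change in changes:
            self.update(change.field_name, change.new_value)


if __name__ == "__main__":
    facebook = {}  # stands in for the copy of your profile held by Facebook
    profile = CentralProfile()
    profile.subscribers.append(lambda name, value: facebook.update({name: value}))

    profile.update("relationship", "in a relationship")
    profile.propose("ZYB", "phone", "+1 555 0100")
    print(profile.data, profile.pending)
    profile.approve_all()
    print(profile.data, facebook)
```

The design choice worth noting is that changes made at the central location spread immediately, while changes coming from other sites wait in a queue until the owner says yes.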

The circle is now complete. Data flows to and from your central location. You are in total control all the time from a centralized location you own. Information is shared across multiple websites and is always up to date.

Start the revolution

To start implementing the above scenario, we as developers need to be pro-active. This is not something that will happen because Tim O’Reilly, Steve Ballmer or anyone else says so. Actually, they already have, but it remains in limbo.

It begins when you, the visionary developer, start implementing one or more of the standards on the websites you build. It starts with you convincing your boss to implement it in your line of CMS products. It starts with you influencing decision makers at your job.

It starts with you. And me. All developers.

The standards are ready to be used. Open source components and libraries have already been developed to let you consume these different standards on various platforms like ASP.NET, PHP and Ruby. Do a quick web search and you'll see. In other words, both ends of the equation already exist – the client and the server technologies and standards.

An easy way to start is to implement microformats in existing websites so that contacts, calendar events, interests and friends are tagged up and ready to be consumed by social networks. Next you can think about implementing FOAF, which adds more detail to your friend list. APML will describe your interests in more detail, and SIOC can describe how you and your contacts interact with each other on the web. I would do it in that order.
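To give a feel for the FOAF step, here is a rough Python sketch that writes a friend list as a FOAF document in RDF/XML using the standard xml.etree.ElementTree module. The build_foaf and person_element helpers, the names and the URLs are invented for the example; only the namespaces and the foaf:Person, foaf:name, foaf:homepage and foaf:knows terms come from the FOAF vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Make the serialized XML use the familiar rdf: and foaf: prefixes.
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def person_element(parent, name, homepage):
    """Append a foaf:Person with a name and a homepage to `parent`."""
    person = ET.SubElement(parent, f"{{{FOAF}}}Person")
    ET.SubElement(person, f"{{{FOAF}}}name").text = name
    ET.SubElement(person, f"{{{FOAF}}}homepage", {f"{{{RDF}}}resource": homepage})
    return person


def build_foaf(owner, friends):
    """Return an RDF/XML string describing `owner` and the people they know.

    `owner` and each entry in `friends` are (name, homepage) tuples.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    me = person_element(root, *owner)
    for friend in friends:
        knows = ET.SubElement(me, f"{{{FOAF}}}knows")
        person_element(knows, *friend)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical people and URLs, for illustration only.
    print(build_foaf(
        ("Jane Doe", "http://jane.example.com/"),
        [("John Smith", "http://john.example.com/")],
    ))
```

A blog platform could generate such a document from the same friend list it already uses for the blogroll, so the site owner never has to see the XML.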

This would just be hot air coming from me if I didn’t believe in it so much. That’s why the next version of BlogEngine.NET will start incorporating this vision. Remember, creating the semantic web boils down to us developers. We are the ones who have to take responsibility for driving the web forward like no one else can.